    Public Transit Arrival Prediction: a Seq2Seq RNN Approach

    Arrival and travel times for public transit vary on account of factors such as seasonality, dwell times at bus stops, traffic signals, and fluctuating travel demand. The developing world in particular is affected by additional factors such as lack of lane discipline, excess vehicles, and diverse modes of transport. This makes bus arrival time prediction (BATP) a challenging problem, especially in the developing world. In this work, a novel data-driven model based on recurrent neural networks (RNNs) is proposed for real-time BATP. The model incorporates both spatial and temporal correlations in a non-linear fashion distinct from existing approaches. In particular, we propose a Gated Recurrent Unit (GRU) based Encoder-Decoder (ED), or Seq2Seq, RNN model (originally introduced for language translation) for BATP. The geometry of the dynamic real-time BATP problem fits the Encoder-Decoder based RNN structure well. We feed relevant additional synchronized inputs (from previous trips) at each step of the decoder (a feature classically unexplored in machine translation applications). Further, motivated by the need to model congestion influences on travel time prediction accurately, we propose to use a bidirectional layer at the decoder (something unexplored in other time-series based ED application contexts). The effectiveness of the proposed algorithms is demonstrated on real field data collected under challenging traffic conditions. Our experiments indicate that the proposed method outperforms diverse existing state-of-the-art data-driven approaches proposed for the same problem.
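
    The encoder-decoder idea in this abstract can be illustrated with a minimal sketch, which is not the authors' model. It assumes a scalar hidden state, untrained hypothetical weights (`PARAMS`, `w_out`), one common GRU gate convention, encoder inputs standing in for recent travel times of the current trip, and decoder inputs standing in for the synchronized values from previous trips. The bidirectional decoder layer is omitted.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h, p):
    """One scalar GRU update. p is a dict of hypothetical weights."""
    z = sigmoid(p["wz"] * x + p["uz"] * h + p["bz"])             # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h + p["br"])             # reset gate
    cand = math.tanh(p["wh"] * x + p["uh"] * (r * h) + p["bh"])  # candidate state
    return (1.0 - z) * h + z * cand                              # blend old and new

def encode(inputs, p):
    """Run the encoder over the input sequence and return the final state."""
    h = 0.0
    for x in inputs:
        h = gru_step(x, h, p)
    return h

def decode(context, extra_inputs, p, w_out):
    """Start from the encoder state; consume one extra input per output step."""
    h = context
    outputs = []
    for x in extra_inputs:
        h = gru_step(x, h, p)
        outputs.append(w_out * h)  # linear read-out of the hidden state
    return outputs

# Hypothetical, untrained weights chosen only so the sketch runs.
PARAMS = {"wz": 0.5, "uz": 0.5, "bz": 0.0,
          "wr": 0.5, "ur": 0.5, "br": 0.0,
          "wh": 1.0, "uh": 1.0, "bh": 0.0}

context = encode([0.2, 0.4, 0.1], PARAMS)              # encoder inputs (assumed)
predictions = decode(context, [0.3, 0.5], PARAMS, 2.0)  # one output per decoder step
```

    The decoder emits one prediction per extra input, which mirrors the one-output-per-step structure described above; a real model would use vector states and learned weights.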

    RNA motif discovery: a computational overview

    Genomic studies have greatly expanded our knowledge of structural non-coding RNAs (ncRNAs). These RNAs fold into characteristic secondary structures and perform specific structure-dependent biological functions. Hence RNA secondary structure prediction is one of the most well-studied problems in computational RNA biology. Comparative sequence analysis is one of the more reliable RNA structure prediction approaches, as it exploits information from multiple related sequences to infer the consensus secondary structure. This class of methods essentially learns a global secondary structure from the input sequences. In this paper, we consider the more general problem of unearthing common local secondary-structure-based patterns from a set of related sequences. The input sequences could, for example, correspond to 3′ or 5′ untranslated regions of a set of orthologous genes, and the unearthed local patterns could correspond to regulatory motifs found in these regions. These sequences could also correspond to in vitro selected RNA, genomic segments housing ncRNA genes from the same family, and so on. Here, we give a detailed review of the various computational techniques proposed in the literature to solve this general motif discovery problem. We also give empirical comparisons of some of the current state-of-the-art methods and point out future directions of research.

    Statistical significance of episodes with general partial orders

    Frequent episode discovery is one of the methods used for temporal pattern discovery in sequential data. An episode is a partially ordered set of nodes with each node associated with an event type. For more than a decade, algorithms existed for episode discovery only when the associated partial order is total (serial episode) or trivial (parallel episode). Recently, the literature has seen algorithms for discovering episodes with general partial orders. In frequent pattern mining, the threshold beyond which a pattern is inferred to be interesting is typically user-defined and arbitrary. One way of addressing this issue in the pattern mining literature has been based on the framework of statistical hypothesis testing. This paper presents a method of assessing the statistical significance of episode patterns with general partial orders. A method is proposed to calculate thresholds, on the non-overlapped frequency, beyond which an episode pattern would be inferred to be statistically significant. The method is first explained for the case of injective episodes with general partial orders. An injective episode is one where event types are not allowed to repeat. Later it is pointed out how the method can be extended to the class of all episodes. The significance threshold calculations for general partial order episodes proposed here also generalize the existing significance results for serial episodes. Through simulation studies, the usefulness of these statistical thresholds in pruning uninteresting patterns is illustrated. (C) 2014 Elsevier Inc. All rights reserved.
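
    To make the non-overlapped frequency concrete, here is a minimal sketch for the simplest case only: a serial episode (total order). It is not the paper's method for general partial orders. It assumes that the event stream is given as a list of event types already sorted by time, and the function name `count_non_overlapped` is hypothetical. The count uses a single automaton that waits for each event type in turn and resets once an occurrence completes.

```python
def count_non_overlapped(events, episode):
    """Count non-overlapped occurrences of a serial episode.

    events:  event types in time order, e.g. ["A", "B", "C"]
    episode: event types in the order the serial episode requires
    """
    if not episode:
        raise ValueError("episode must contain at least one event type")
    count = 0
    pos = 0  # index of the episode node the automaton is waiting for
    for event in events:
        if event == episode[pos]:
            pos += 1
            if pos == len(episode):  # one full occurrence completed
                count += 1
                pos = 0              # reset so occurrences cannot overlap
    return count

stream = list("AABBCABC")
print(count_non_overlapped(stream, ["A", "B", "C"]))  # → 2
print(count_non_overlapped(stream, ["A", "B"]))       # → 2
```

    Completing each occurrence as early as possible before starting the next is what keeps the counted occurrences from interleaving.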

    Parametric Localization of Correlated Incoherently Distributed Sources Using ESPRIT

    Most of the distributed source algorithms discussed in the literature have restricted themselves to an uncorrelated distributed source scenario. The fundamental question asked was whether the ESPRIT-based algorithm for the localization of uncorrelated incoherently distributed (ID) sources could be applied to a correlated ID source scenario. For this, we assumed the existence of a specific form of the angular cross-correlation kernel, which resulted in a block diagonal structure of the effective source covariance matrix. ESPRIT can be applied in principle as long as each of these non-zero blocks is invertible. We could check, both analytically and by simulations, that the algorithm could be used in a scenario with two uniform incoherently distributed sources with this specific non-zero angular cross-correlation kernel.

    Discovering frequent chain episodes

    Frequent episode discovery is a popular framework in temporal data mining with many applications. An episode is a partially ordered set of nodes with each node associated with an event type. The episodes literature has seen different notions of frequency, and a variety of associated discovery algorithms under these different frequencies, when the associated partial order is total (serial episode) or trivial (parallel episode). Recently, an apriori-based discovery algorithm was proposed, based on the non-overlapped frequency measure, for mining episodes where the associated partial order has no restriction but the node to event-type association is one-to-one (general injective episodes). That work pointed out that frequency alone is not a sufficient indicator of interestingness in the context of episodes with general partial orders and introduced a new measure of interestingness called bidirectional evidence (BE) to address this issue. The algorithm discovers episodes by incorporating both frequency and BE thresholds in the level-wise procedure. In this paper, we extend this BE-based algorithm to a much larger class of episodes that we call chain episodes. This class encompasses all serial and parallel episodes (injective or otherwise) and also many other non-injective episodes with unrestricted partial orders. We first discuss how the BE measure can be generalized to chain episodes and prove the monotonicity property it satisfies in this general context. We then describe our candidate generation step (with correctness proofs), which exploits this new monotonicity property. We further describe the frequency counting (with correctness proofs) and BE computation steps for chain episodes. The experimental results demonstrate the effectiveness of our algorithms.